update vector search docs #18779

Open
wants to merge 18 commits into
base: master

qiancai
Collaborator

@qiancai qiancai commented Sep 2, 2024

First-time contributors' checklist

What is changed, added or deleted? (Required)

This PR moves 15 vector search docs from the tidb-cloud folder to the vector-search folder so that they can be reused by TiDB Self-Managed docs.

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v8.4 (TiDB 8.4 versions)
  • v8.3 (TiDB 8.3 versions)
  • v8.2 (TiDB 8.2 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)
  • v5.3 (TiDB 5.3 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot ti-chi-bot bot added missing-translation-status This PR does not have translation status info. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 2, 2024
@qiancai qiancai added the translation/no-need No need to translate this PR. label Sep 2, 2024
@ti-chi-bot ti-chi-bot bot removed the missing-translation-status This PR does not have translation status info. label Sep 2, 2024
@qiancai qiancai changed the base branch from master to v8.4-vector-search September 2, 2024 09:54
@qiancai qiancai self-assigned this Sep 2, 2024
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 3, 2024
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 3, 2024

ti-chi-bot bot commented Sep 3, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-03 02:39:01.388127072 +0000 UTC m=+325665.906179996: ☑️ agreed by Oreoxmt.

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 3, 2024
@qiancai qiancai changed the base branch from v8.4-vector-search to master September 3, 2024 07:00
@qiancai qiancai added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 3, 2024
@qiancai qiancai changed the title reuse vector search docs as a base update vector search docs Sep 3, 2024

ti-chi-bot bot commented Sep 3, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from qiancai, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Oreoxmt
Collaborator

Oreoxmt commented Sep 3, 2024

/approve cancel

@ti-chi-bot ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024

ti-chi-bot bot commented Sep 6, 2024

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@qiancai qiancai added v8.4 This PR/issue applies to TiDB v8.4. translation/from-docs-cn This PR is translated from a PR in pingcap/docs-cn. labels Sep 14, 2024
@qiancai qiancai removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 19, 2024
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 20, 2024
@Oreoxmt Oreoxmt self-requested a review September 24, 2024 03:58

> **Warning:**
>
> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.
Member

@breezewish breezewish Sep 25, 2024


Suggested change
> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.
> The vector search feature is experimental and some behaviors may change in future versions. It is not recommended that you use it in the production environment. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.

Remove "This feature might be changed or removed without prior notice." as Vector Search is experimental because of stability issues, not product decision issues. Vector Search will never be removed.


> **Warning:**
>
> The vector search feature is in beta. It might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.
Member


Suggested change
> The vector search feature is in beta. It might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.
> The vector search feature is in beta and some behaviors may change in future versions. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.

CREATE TABLE foo (
id INT PRIMARY KEY,
data VECTOR(5),
data64 VECTOR64(10),


Suggested change
data64 VECTOR64(10),

We do not support this syntax.


ti-chi-bot bot commented Sep 27, 2024

@qiancai: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-verify | 4aa3caf | link | true | `/test pull-verify` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Comment on lines +127 to +130
## From v7.x to v8.4 or a later version

Starting from v8.4, the underlying storage format of TiFlash has been updated to support [vector search](/vector-search-overview.md). Therefore, after you upgrade TiFlash to v8.4 or a later version, in-place downgrading to the original version is not supported.

Contributor


FYI, I've pushed a commit about the tiflash upgrade notice

Comment on lines +24 to +26
TiDB currently supports the following vector search index algorithm:

- HNSW
Collaborator


Suggested change
TiDB currently supports the following vector search index algorithm:
- HNSW
TiDB currently supports the [HNSW (Hierarchical Navigable Small World)](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) vector search index algorithm.

- TiFlash nodes must be deployed in your cluster in advance.
- Vector search indexes cannot be used as primary keys or unique indexes.
- Vector search indexes can only be created on a single vector column and cannot be combined with other columns (such as integers or strings) to form composite indexes.
- A distance function must be specified when creating and using vector search indexes (currently, only cosine distance `VEC_COSINE_DISTANCE()` and L2 distance `VEC_L2_DISTANCE()` functions are supported).
Collaborator


Suggested change
- A distance function must be specified when creating and using vector search indexes (currently, only cosine distance `VEC_COSINE_DISTANCE()` and L2 distance `VEC_L2_DISTANCE()` functions are supported).
- A distance function must be specified when creating and using vector search indexes. Currently, only cosine distance `VEC_COSINE_DISTANCE()` and L2 distance `VEC_L2_DISTANCE()` functions are supported.
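
For context on the two distance functions named in this limitation, here is a minimal Python sketch of the math they are generally understood to compute: cosine distance as one minus cosine similarity, and L2 distance as the Euclidean norm of the difference. This is not TiDB's implementation. The function names are hypothetical, and raising an error for zero vectors is a choice made for this sketch only.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors (illustrative only)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Sketch-only choice: cosine similarity is undefined for a zero vector.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

The two functions rank neighbors differently: cosine distance ignores vector length, while L2 distance does not, which is one reason a query typically needs to use the same function that the index was built with.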

>
> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.

</CustomContent>

Collaborator


Suggested change
<CustomContent platform="tidb-cloud">


> **Note:**
>
> Vector search index is only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters.
> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters.

Collaborator


Suggested change
</CustomContent>

- A distance function must be specified when creating and using vector search indexes (currently, only cosine distance `VEC_COSINE_DISTANCE()` and L2 distance `VEC_L2_DISTANCE()` functions are supported).
- For the same column, creating multiple vector search indexes using the same distance function is not supported.
- Deleting columns with vector search indexes is not supported. Creating multiple indexes in the same statement is not supported.
- Setting vector search indexes as [invisible](/sql-statements/sql-statement-alter-index.md) is not supported.
Collaborator


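Added this item:

  • Modifying the type of a column with a vector index is not supported (lossy change, that is, the column data is modified).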
Suggested change
- Setting vector search indexes as [invisible](/sql-statements/sql-statement-alter-index.md) is not supported.
- Modifying the type of a column with a vector index is not supported (lossy change, that is, column data is modified).
- Setting vector search indexes as [invisible](/sql-statements/sql-statement-alter-index.md) is not supported.


ALTER TABLE foo ADD VECTOR INDEX idx_name ((VEC_COSINE_DISTANCE(data))) USING HNSW;
```

Collaborator


I suggest deleting this note. It is easy to forget to update it when the feature becomes GA. There is another occurrence at L90.

@@ -95,15 +156,15 @@ SELECT * FROM
) t
WHERE category = "document";

-- Note that this query may return less than 5 results if some are filtered out.
-- Note that this query might return less than 5 results if some are filtered out.
Collaborator


Suggested change
-- Note that this query might return less than 5 results if some are filtered out.
-- Note that this query might return fewer than 5 results if some are filtered out.
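
The behavior this SQL comment describes can be illustrated with a small Python sketch, assuming the query first takes the K nearest rows and only then applies the category filter. The rows, field names, and the `top_k_then_filter` helper are made up for illustration and are not part of the docs under review.

```python
import math


def top_k_then_filter(rows, query, k, category):
    """Take the k nearest rows by L2 distance, then keep only the given category."""
    nearest = sorted(rows, key=lambda r: math.dist(r["embedding"], query))[:k]
    return [r for r in nearest if r["category"] == category]


# Hypothetical data: five "document" rows exist, but two of the five
# nearest neighbors belong to another category.
rows = [
    {"id": 1, "category": "document", "embedding": [0.0, 0.0]},
    {"id": 2, "category": "image", "embedding": [0.1, 0.0]},
    {"id": 3, "category": "document", "embedding": [0.2, 0.0]},
    {"id": 4, "category": "image", "embedding": [0.3, 0.0]},
    {"id": 5, "category": "document", "embedding": [0.4, 0.0]},
    {"id": 6, "category": "document", "embedding": [0.5, 0.0]},
    {"id": 7, "category": "document", "embedding": [0.6, 0.0]},
]

result = top_k_then_filter(rows, query=[0.0, 0.0], k=5, category="document")
result_ids = [r["id"] for r in result]
```

Here only three rows survive the filter even though five matching rows exist, because the filter runs after the nearest-neighbor cut.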

@@ -163,9 +251,11 @@ SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES;

For more information, see [`ALTER TABLE ... COMPACT`](/sql-statements/sql-statement-alter-table-compact.md).

In addition, you can monitor the execution progress of the DDL job by executing `ADMIN SHOW DDL JOBS;` and checking the `row count`. However, this method is not fully accurate, because the `row count` value is obtained from the `rows_stable_indexed` field in `TIFLASH_INDEXES`. This approach can used as a reference for tracking the progress of indexing.
Collaborator


Suggested change
In addition, you can monitor the execution progress of the DDL job by executing `ADMIN SHOW DDL JOBS;` and checking the `row count`. However, this method is not fully accurate, because the `row count` value is obtained from the `rows_stable_indexed` field in `TIFLASH_INDEXES`. This approach can used as a reference for tracking the progress of indexing.
In addition, you can monitor the execution progress of the DDL job by executing `ADMIN SHOW DDL JOBS;` and checking the `row count`. However, this method is not fully accurate, because the `row count` value is obtained from the `rows_stable_indexed` field in `TIFLASH_INDEXES`. You can use this approach as a reference for tracking the progress of indexing.
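
As a rough illustration of why the `row count` is only an approximation, here is a hedged Python sketch that turns an indexed-row count into a progress fraction. It assumes that a count of stable rows not yet indexed is also available. The parameter name `rows_stable_not_indexed` and the helper itself are assumptions for illustration and are not taken from this PR.

```python
def indexing_progress(rows_stable_indexed, rows_stable_not_indexed):
    """Return the fraction of stable-layer rows that already have an index built.

    Both arguments are assumed to be non-negative row counts. Rows outside the
    stable layer are not counted, which is one reason the figure is approximate.
    """
    if rows_stable_indexed < 0 or rows_stable_not_indexed < 0:
        raise ValueError("row counts must be non-negative")
    total = rows_stable_indexed + rows_stable_not_indexed
    if total == 0:
        # Sketch-only choice: with no stable rows, nothing is left to index.
        return 1.0
    return rows_stable_indexed / total
```

For example, 750 indexed rows and 250 pending rows give a progress of 0.75, regardless of how many newly written rows have not yet reached the stable layer.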


- `<HOST>`: The host of the TiDB cluster.
- `<PORT>`: The port of the TiDB cluster.
- `<USER>`: The username to connect to the TiDB cluster.
Collaborator

@hfxsd hfxsd Sep 30, 2024


The Chinese version uses <USERNAME>. The two versions need to be consistent.


- `<HOST>`: The host of the TiDB cluster.
- `<PORT>`: The port of the TiDB cluster.
- `<USER>`: The username to connect to the TiDB cluster.
Collaborator


The Chinese version uses <USERNAME>.


- `<HOST>`: The host of the TiDB cluster.
- `<PORT>`: The port of the TiDB cluster.
- `<USER>`: The username to connect to the TiDB cluster.
Collaborator


The Chinese version uses <USERNAME>.

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-1-more-lgtm Indicates a PR needs 1 more LGTM. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. translation/from-docs-cn This PR is translated from a PR in pingcap/docs-cn. translation/no-need No need to translate this PR. v8.4 This PR/issue applies to TiDB v8.4.

6 participants